The project allows you to search images on Unsplash using a natural language description. It is powered by OpenAI's CLIP model.
Use this notebook to search for images on Unsplash.
This notebook automatically searches for images on Unsplash and downloads them. For this you will need to have an Unsplash account and to register an application so that you can receive an API key. You can do that here: https://unsplash.com/oauth/applications.
After you have an API key, you need to create a file called .env containing the following text:
UNSPLASH_ACCESS_KEY=<your access key>
Alternatively, just modify the value of the variable unsplash_access_key below.
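If you'd rather not depend on python-dotenv, a minimal hand-rolled loader (a sketch, not part of the notebook; the function name load_env_file is made up) can populate os.environ from the same .env file:

```python
import os

def load_env_file(path=".env"):
    # Minimal stand-in for python-dotenv: parse KEY=VALUE lines into os.environ
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything without an "="
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Don't overwrite variables already set in the environment
            os.environ.setdefault(key.strip(), value.strip())
```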
The search process consists of two steps: an initial keyword search on Unsplash, followed by a re-ranking of the downloaded photos with CLIP.
You need to provide 3 parameters for the search:
description - a natural language description of what you want to see in the photos
search_keywords - broad keywords used for the initial photo selection
search_count - number of photos to get for the initial selection
description = "a car driving in the woods"
search_keywords = "car"
search_count = 100
We need to retrieve the Unsplash API key so that we can make calls to the API.
import os
from dotenv import load_dotenv
load_dotenv()
unsplash_access_key = os.getenv('UNSPLASH_ACCESS_KEY')
First, we search Unsplash for photos matching the provided keywords to build the initial selection.
import os
import math
import json
from urllib.request import Request, urlopen
from urllib.parse import quote_plus
# Convert the search keywords into a format suitable for the API
query_string = quote_plus(search_keywords)
# Compute how many pages we need to fetch from the search results (assuming 20 photos per page)
photos_per_page = 20
pages_count = math.ceil(search_count/photos_per_page)
# Go through each search result page and store the URLs of the photos
photo_urls = []
for page in range(0, pages_count):
    # Make an authenticated call to the API and parse the results as JSON
    request = Request(f"https://api.unsplash.com/search/photos?page={page+1}&per_page={photos_per_page}&query={query_string}")
    request.add_header("Authorization", f"Client-ID {unsplash_access_key}")
    response = urlopen(request).read().decode("utf-8")
    search_result = json.loads(response)

    # Add each photo URL to the list
    for photo in search_result['results']:
        photo_urls.append(photo['urls']['raw'])
# Display some statistics
display(f'Photos found: {len(photo_urls)}')
'Photos found: 100'
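The Unsplash API is rate-limited (50 requests per hour for demo applications, at the time of writing), so paginated calls can fail with HTTP 429. A minimal retry wrapper could look like this (a sketch; fetch_with_retry is a hypothetical helper and fetch is any zero-argument callable, neither is part of the notebook):

```python
import time
from urllib.error import HTTPError

def fetch_with_retry(fetch, retries=3, backoff=1.0):
    # Retry a zero-argument callable on HTTPError, with exponential backoff
    for attempt in range(retries):
        try:
            return fetch()
        except HTTPError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff * (2 ** attempt))
```

The loop above could then wrap its urlopen call in a small lambda and pass it to fetch_with_retry.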
After that, we need to download all the selected photos. We download the photos in parallel using 16 threads.
from urllib.request import urlopen
from multiprocessing.pool import ThreadPool
from PIL import Image
# Function used to load a photo from the API
# The photos are downloaded in a small resolution (max 500 pixels wide), because CLIP only supports 224x224 images
def load_photo(url):
    return Image.open(urlopen(url + "&w=500"))
# Parallelize the download using a thread pool
pool = ThreadPool(16)
photos = pool.map(load_photo, photo_urls)
# Display some statistics
display(f'Photos downloaded: {len(photos)}')
'Photos downloaded: 100'
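Individual downloads can fail on a flaky network, and pool.map will abort the whole batch on the first exception. A more forgiving variant of the thread-pool step (a sketch; download_all and loader are illustrative names, not from the notebook) skips failures instead:

```python
from multiprocessing.pool import ThreadPool

def download_all(urls, loader, threads=16):
    # Wrap the loader so one failed URL doesn't abort the whole batch
    def safe(url):
        try:
            return loader(url)
        except Exception:
            return None
    with ThreadPool(threads) as pool:
        results = pool.map(safe, urls)
    # Drop the failures
    return [r for r in results if r is not None]
```

You could call it as download_all(photo_urls, load_photo) in place of the pool.map above.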
We can now display the photos from the initial selection. They will not yet match your natural language description.
import ipyplot
# Sort the photos by height so the grid displays more evenly
photos_heights = [photo.height for photo in photos]
photos_with_heights = zip(photos_heights, photos)
photos_sorted = sorted(photos_with_heights, key=lambda x: x[0])
photos_sorted = [photo[1] for photo in photos_sorted]
# Display the images from the search query representing our initial selection
ipyplot.plot_images(photos_sorted, labels=[""]*search_count, max_images=search_count, img_width=100)
Now we need to process the photos with CLIP. This converts them into a vector space in which they can be compared to a text description. We do the same with the description itself.
import clip
import torch
# Load the open CLIP model
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
with torch.no_grad():
    # Encode and normalize the description using CLIP
    description_encoded = model.encode_text(clip.tokenize(description).to(device))
    description_encoded /= description_encoded.norm(dim=-1, keepdim=True)

    # Preprocess all photos and stack them in a batch
    photos_preprocessed = torch.stack([preprocess(photo) for photo in photos]).to(device)

    # Encode and normalize the photos using CLIP
    photos_encoded = model.encode_image(photos_preprocessed)
    photos_encoded /= photos_encoded.norm(dim=-1, keepdim=True)
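Because both the description and the photo embeddings are normalized to unit length, a plain dot product between them is exactly cosine similarity. A small numpy check of that identity, with made-up stand-in vectors:

```python
import numpy as np

a = np.array([[3.0, 4.0]])               # stand-in for the description vector
b = np.array([[4.0, 3.0], [0.0, 5.0]])  # stand-ins for two photo vectors

# Normalize rows to unit length, as done for the CLIP embeddings
a_n = a / np.linalg.norm(a, axis=-1, keepdims=True)
b_n = b / np.linalg.norm(b, axis=-1, keepdims=True)

# Dot products of the unit vectors...
dots = (a_n @ b_n.T).squeeze(0)

# ...equal cos(theta) = a.b / (|a| |b|) computed on the raw vectors
cosines = np.array([np.dot(a[0], row) / (np.linalg.norm(a[0]) * np.linalg.norm(row))
                    for row in b])
```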
Now we can compare each photo to the text description and choose the most similar ones.
# Retrieve the description vector and the photo vectors
description_vector = description_encoded.cpu().numpy()
photo_vectors = photos_encoded.cpu().numpy()
# Compute the similarity between the description and each photo using cosine similarity
similarities = list((description_vector @ photo_vectors.T).squeeze(0))
# Sort the photos by their similarity score
best_photos = sorted(zip(similarities, photos), key=lambda x: x[0], reverse=True)
# Display the best 5 photos
ipyplot.plot_images([im[1] for im in best_photos[:5]], labels=[""]*5, img_width=300)
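An equivalent way to pick the top matches is to sort indices rather than (score, image) pairs, which keeps the scores available if you want to show them as labels. A sketch with made-up similarity scores:

```python
import numpy as np

# Made-up similarity scores for five photos
similarities = np.array([0.21, 0.35, 0.10, 0.33, 0.28])

# Indices of the top-3 scores, best match first
best_idx = np.argsort(similarities)[::-1][:3]

# The best photos themselves would then be: [photos[i] for i in best_idx]
```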